SUBSTITUTE SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: GENSET SA 

(B) STREET: 24 RUE ROYALE 

(C) CITY: PARIS 

(E) COUNTRY: FRANCE 

(F) POSTAL CODE: 75008 

(ii) TITLE OF INVENTION: HUMAN DEFENSIN DEF-X GENE AND cDNA, COMPOSITION
CONTAINING SAME AND DIAGNOSTIC AND THERAPEUTIC APPLICATIONS

(iii) NUMBER OF SEQUENCES: 6 

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Saliwanchik, Lloyd & Saliwanchik 

(B) STREET: 2421 N.W. 41st Street, Suite A-1

(C) CITY: Gainesville 

(D) STATE: Florida 

(E) COUNTRY: USA 

(F) ZIP: 32606 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: unassigned

(B) FILING DATE: OCTOBER 18, 2001 

(vii) PRIORITY APPLICATION DATA: 

(A) APPLICATION NUMBER: 09/486,580

(B) FILING DATE: FEBRUARY 25, 2000 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Frank C. Eisenschenk, Ph.D. 

(B) REGISTRATION NUMBER: 45,332 

(C) REFERENCE/DOCKET NUMBER: GEN-100D1

(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4415 BASE PAIRS

(B) TYPE: NUCLEOTIDE 

(C) STRANDEDNESS: DOUBLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: Exon 1 

(B) LOCATION: 1836..1874

(ix) FEATURE:

(A) NAME/KEY: Exon 2 

(B) LOCATION: 3394..3577

(ix) FEATURE: 

(A) NAME/KEY: Exon 3 

(B) LOCATION: 4161..4380

(ix) FEATURE: 

(A) NAME/KEY: start CDS 

(B) LOCATION: 3406..3408

(ix) FEATURE: 

(A) NAME/KEY: Stop CDS 

(B) LOCATION: 4276..4278

(ix) FEATURE: 

(A) NAME/KEY: polyadenylation site

(B) LOCATION: 4374.. 4379 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

ACACCATTTG TCTTCATGTA ACCCCATTAG CTATACCCTC TAGTGCAAGG AAACCATAGG 60

GCCTAGGTCA CACCATGAGG CTGCNCTTAC AAGTTATGCA AAAACTATGG ACTTGGGAGA 120

CCTGTGCGTA ACAACATCAC ACNCCAAATT TAACCAGCTC TCCCCATAAC AGCACGCTCA 180

TGTGTTACTG AGGAAATGCC TGTGGATTGG AGTGTGTTCT GTGTGCAGGA GGCTGGTCCA 240

GGTTTCACTT CTGCAGGACA CTGGACGTTT CCCAAAACCA GCAGACTTTC CCCACGTGCA 300

CACACACCCC TTCTCATTTT GCCTCTACAT CCATATCCAC TGGGCCCTTC AGGCACCTAC 360

TAATGCCCTA GAACCTAAAA CCATCATCTG GGGCCCAGTT CCCTGAATGG CCCTAATCTC 420

TTCCTCTGCT GGAATGAGTC CAGTGCCCAC TTCCTCCAAC GGTGAAATTG CTGGGCTGCT 480

ACAGATCAGG AACTCACTGC TTCCTCATAG GGGCAGCCGA CTTCACTGCT CTGCAACAGC 540

GACCACCCCT AGCGAGGCTT GAGATGCCTC TTGCCTCCTT AAGACTGAGG GAGACGCTTC 600

AGCTCTCACT CCACTGCCCC AAGTCCTCCA CAGCGCGGTG CCTGCTGCCT TCACACAGAG 660

CTGCAGGGGN AGGTCCTGTG TATCCGGCCT GCTGGACCAG CGCTGTGCAC AACCCTCCCA 720

TGGCAACAGT GGCTGCCCGG CCTGCACACT GGGCTTGGCA ACCTCGCTGT AGGTATTTAT 780

TCCCTCAGGA GTGACTGCAT TCTTTTCCCA TTTCCAGAAA ACTGATGCCA TTTACCTCAC 840

TATGAGGAGG AGGAGGAGGA GGAGGGTGGA GAGTGGTACA TTTTAAAATG TGCACTATTC 900

TCCCTAGGAC TCCCCCTCAA ATAACCCAGG AGGGACCATA CCAGCTCATT CCTGTGTATC 960

CCAAGCATAN GAGTAATCAT CCCACTCATG CTGAGTGTAT GGTGGCCATT AAGCCTGCCC 1020

TGAACTGGCT TTAGAACAAG GTGTTTGAGC ACACAGCACC GTCTTGCTGC CACCTTGGCC 1080

CCCTCCCTTG TGAGACCTCT GAGACACATT NAGGTCTCAC CTAAAAATCT CAGGATTTCT 1140

AGGCCCAAAN CGGTCCTAAA AAATTGTTCA GTCTGAACTC TCTAAGGTCA AGAGAAGAGG 1200

TGGTTGCTCC CTCTAAGAAA CCACATGTTG CATGTACATC CTTAATTCCG GAAAGTCCAA 1260

CAAACCTGCC CTGCTTAGCA ACACAAGCCG AGGTGGTACT CCTCTCACCC GGGCATTCTC 1320 

CAACACACCT GTTTGTCCAA ACAGCTTTGA TTTGTTTTTA TAGTTGGACC CCAGGTTCCC 1380 

AGGAGGCTGG TTCAGGCCAT ATTCCAAATC CTCATCTGTG TGTGAGTGGC ATTCTTAGCC 1440 

TAGCCTCCTT ACAGGGTGGA TACTATGATA CACAGCCAGG CTGTCCCAGT GGCTTTCAAT 1500 

ATTCTTTTGG TCCAGATAGT TCAGCCTCAG CACCAGTGTA GGCATCACAG GGTCAATTGT 1560 

CTTAGGAGTC ATGGAGAATT CATAGTTGGT AGCTACCTGG GCCTGGCCAG GGCTGACCAT 1620

AGACAAGGCA TCCCTCTGTG AACTCCTATT TTAATGCCAG CTTCCCAACA AATTTCTCAA 1680 

CTGCTCTTAC CAGCAGGTAT TTAAACTACT CAATAGAAAG TAACCCTGAA AATTAGGACA 1740

CCTGTTCCCA AAAGACCCTT AAATAGGGGA AGTCCTTTCN CTGCTTGTGC ACAGCTGCTG 1800

ATGTGGCAAC ATGAGGCCTG GGACAGGGGA CTGTCCTCTG CCCACTCTGG TAGCCTCACG 1860

TAGCTTAACA ATCTGTCAGT AATACAATAC AAAACTTAAA CTTTCATACT GCGGTTCCAC 1920

CCAGGAAGCT GTGTTCCCAA TCTGACCCGT GATTATGGGG CCACCTCAGA GGGNACCCAG 1980

TGAGGGAATA TTTTGCCATC TGGGACTGTT GGTTGCTGGG GGCAGTGGCT ATGAGCTCAG 2040

TTAATAAACT CAAGCAGTTT CCTTCCAAAC ACACATGTCC TACTTAACGT GTCCAACAGA 2100 

GATGATCATA CTCATANGCT GCTAAAACAT TANTTTTATT TTGAGAAAAG TCTATTCATG 2160 

TTCTTGGCCC ATGGAGTTTT CATTTNATTA NTTTATTTAT TTTGCAGAGA TGGAGTCTCA 2220

CTATGTTGCT CAAGCTGGTC TCCAACTCCT GGGCTCAAGC GATCTTCCTA CTTTGGCCTT 2280

TGAAAGCGCT GAGATTGCCT GTGTGAGCCA TCATGGGGGC TCACTGGCCC ACTGATTAAT 2340

CAGATTAATT GTTTTTTGCT ATTGAANTTG TTTGACTTCC TTGTATATTC GGATATTTAC 2400

CCATTCTAAC ACGTAGGGTT TGCAAATATT TTCTCTCATG TTCTGTGTTG CCTTTTCACT 2460

CAGTTGATGG TTTCCTTTGC TGTGCAGGTG CTTTAGTGTT CAACGCAGCC CCGCTTGTCT 2520

ATTTTCCATT TTATTGCCTG TCCCTTTGAT GTCATAGCCA AGAAATAATT GCCCAGATTA 2580

ATGTCAAAAA GCTTTATCCC TATATATTCT TCTAGTAGTT TATGGTTTCA GATCTTATGT 2640

TTAGGTCTTC AATCCATTGA GTTGATTTTT GTATGTGGTA TAAGAAAAAA GACCACATGT 2700

ATACATATCT CAAATTCTAA GGTAGTATAT ATTAGACACA TACAATGTGT CTATTTACAC 2760

ACATTGAGCT GAAAATAATA AACATATTTT TATCTTTCAA TCAACTCTAT CTCTATCTCA 2820 

CTGAACTTGT TTCACCTATA GCCTGATGAG GTTGCTGTCC TCTCTACCCC AGCTCCTATA 2880 

GGAGACTGCT CATCCCCTAA CCTCAAAAAC CCCTTCATGA GGGTGATAAT GCCCTTGAAT 2940

CCTGCAATGA ATTAGTTCTC TACTACAGTG GAATTCAGGT CTGTTATGAG GGTCTGGATC 3000

TCTGAAGAGA AGAGCTCTCA TTTTCAGAAA ATAAGCAGGA TTTATTCCCT GAAATTACTG 3060

AATTAAATCA CTGTTTCGAT TACTTTTTGC AATATTAAAA GTAAATATTT AAACAGGTAA 3120 

AAACAGAAAT AATGGTAGGG TCCTTATCAT CACCGTGAAT TCCAAGCTAG CATAGACACT 3180 

AAACCTAGAG ATTCACACTA GAATGAAAGC TGGGAGAGCA GAGGAGTCTC AGAAGGATGT 3240 

GGAGGCCAAT GGACACCTGC AACCTCTCCA ACGAAATGCC TACCTCCTCT CACTGCAGCA 3300

TCCATCTCTG AGCCTTCTCG CAGCAGAGCT ATAAATTCAG CCTGGCTCCT CCGTTCCCAC 3360

ACATCCACTC CTGCTCTCCC TCCTCTCCTC CAGGTGACTA CAGTTATGAG GACCCTCACC 3420

CTCCTCTCTG CCTTTCTCCT GGTGGCCCTT CAGGCCTGGG CAGAGCCGCT CCAGGCAAGA 3480

GCTCATGAGA TGCCAGCCCA GAAGCAGCCT CCAGCAGATG ACCAGGATGT GGTCATTTAC 3540

TTTTCAGGAG ATGACAGCTG CTCTCTTCAG GTTCCAGGTG AGAGATGCCA GCATGCAGAG 3600 

CTACAGACTA GACAGAAGGA CAGGAGACAG GCTCTGGAAT TGGATCTCAG TGGCAGATGT 3660

CACTTAGGTG GCTATACTTA ACATCTCTGG TCCTGGATTT TCTCATATCT AAATGGAATA 3720 

GAGAACCAAA GAAATCTAAG AGATTTTTCT TTCTCCAAAA ACTTGATTCC AAGATATGAC 3780 

TGTGAAATTC ACTAGATTTA AGATATAAGG AGATGCTACC TAGTTCCTTC TGGAGCCAGA 3840

CAAACAAGCT TAAGTATATA GGAAAATATT TCACCCTGTC TATATAGGAG GTTTTAGAAC 3900

CTGGAGAGGA GCCTAAGAAT GTGTTCAGGT GTGTGTGTGA TGGGCAGGAA TGCAGAAAAG 3960

TGAAGCAAAG GAGAATGAGT CTCGAATCCT GTGTGACCAG CACTGCTCTG TGTATTTATT 4020

CCTATTGACT GAGATTGTTT GTGCTACCGG CTGTAATACA GCCAACATCA CTCATCAGCC 4080

AACATGTGAC TTCTCCAAGA TTCCCTTTAC CACCCACTGC TGNACCCCGT ACTCAGTTTC 4140

TGATGCTCTC TCTGGGTCCC CAGGCTCAAC AAAGGGCTTG ATCTGCCATT GCAGAGTACT 4200

ATACTGCATT TTTGGAGAAC ATCTTGGTGG GACCTGCTTC ATCCTTGGTG AACGCTACCC 4260 

AATCTGCTGC TACTAAGCTT GCAGACTAGA GAAAAAGAGT TCATAATTTT CTTTGAGCAT 4320 

TAAAGGGAAT TGTTATTCTT ATACCTTGTC CTCGATTTCC TGTCCTCATC CCAAATAAAT 4380 

ACTTGGTAAC ATGATTTCCG GGTTTTTTTT TTTTT 4415

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 453 BASE PAIRS 

(B) TYPE: NUCLEOTIDE 

(C) STRANDEDNESS: DOUBLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:



CTCTGCCCAC TCTGGTAGCC TCACGTAGCT TAACAATCTG TGACTACAGT T ATG AGG 57 

Met Arg 
1 

ACC CTC ACC CTC CTC TCT GCC TTT CTC CTG GTG GCC CTT CAG GCC TGG 105
Thr Leu Thr Leu Leu Ser Ala Phe Leu Leu Val Ala Leu Gln Ala Trp
5 10 15

GCA GAG CCG CTC CAG GCA AGA GCT CAT GAG ATG CCA GCC CAG AAG CAG 153
Ala Glu Pro Leu Gln Ala Arg Ala His Glu Met Pro Ala Gln Lys Gln
20 25 30

CCT CCA GCA GAT GAC CAG GAT GTG GTC ATT TAC TTT TCA GGA GAT GAC 201
Pro Pro Ala Asp Asp Gln Asp Val Val Ile Tyr Phe Ser Gly Asp Asp
35 40 45 50

AGC TGC TCT CTT CAG GTT CCA GGC TCA ACA AAG GGC TTG ATC TGC CAT 249
Ser Cys Ser Leu Gln Val Pro Gly Ser Thr Lys Gly Leu Ile Cys His
55 60 65

TGC AGA GTA CTA TAC TGC ATT TTT GGA GAA CAT CTT GGT GGG ACC TGC 297
Cys Arg Val Leu Tyr Cys Ile Phe Gly Glu His Leu Gly Gly Thr Cys
70 75 80

TTC ATC CTT GGT GAA CGC TAC CCA ATC TGC TGC TAC TAA GCTTGCAGAC 346
Phe Ile Leu Gly Glu Arg Tyr Pro Ile Cys Cys Tyr *
85 90 95



TAGAGAAAAA GAGTTCATAA TTTTCTTTGA GCATTAAAGG GAATTGTTAT TCTTATACCT 406
TGTCCTCGAT TTCCTGTCCT CATCCCAAAT AAATACTTGG TAACATG 453 



(2) INFORMATION FOR SEQ ID NO: 3:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 94 AMINO ACIDS 

(B) TYPE: AMINO ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: PROTEIN 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: SIGNAL PEPTIDE 

(B) LOCATION: 1..19 

(ix) FEATURE: 

(A) NAME/KEY: PRO REGION 

(B) LOCATION: 20..63

(ix) FEATURE: 

(A) NAME/KEY: MATURE PEPTIDE 
(B) LOCATION: 64..94

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

Met Arg Thr Leu Thr Leu Leu Ser Ala Phe Leu Leu Val Ala Leu Gln
1 5 10 15

Ala Trp Ala Glu Pro Leu Gln Ala Arg Ala His Glu Met Pro Ala Gln
20 25 30

Lys Gln Pro Pro Ala Asp Asp Gln Asp Val Val Ile Tyr Phe Ser Gly
35 40 45

Asp Asp Ser Cys Ser Leu Gln Val Pro Gly Ser Thr Lys Gly Leu Ile
50 55 60

Cys His Cys Arg Val Leu Tyr Cys Ile Phe Gly Glu His Leu Gly Gly
65 70 75 80

Thr Cys Phe Ile Leu Gly Glu Arg Tyr Pro Ile Cys Cys Tyr
85 90

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 AMINO ACIDS

(B) TYPE: AMINO ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: SIGNAL PEPTIDE 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 



Met Arg Thr Leu Thr Leu Leu Ser Ala Phe Leu Leu Val Ala Leu Gln
1 5 10 15

Ala Trp Ala 



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 AMINO ACIDS 

(B) TYPE: AMINO ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: PRO REGION 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:



Glu Pro Leu Gln Ala Arg Ala His Glu Met Pro Ala Gln Lys Gln Pro
1 5 10 15

Pro Ala Asp Asp Gln Asp Val Val Ile Tyr Phe Ser Gly Asp Asp Ser
20 25 30

Cys Ser Leu Gln Val Pro Gly Ser Thr Lys Gly Leu
35 40 



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 AMINO ACIDS 

(B) TYPE: AMINO ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: MATURE PEPTIDE

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 



Ile Cys His Cys Arg Val Leu Tyr Cys Ile Phe Gly Glu His Leu Gly
1 5 10 15

Gly Thr Cys Phe Ile Leu Gly Glu Arg Tyr Pro Ile Cys Cys Tyr
20 25 30 


